
optimize bandpower window integration #80

Merged
merged 1 commit into from
Sep 4, 2024
Conversation

cmbant
Collaborator

@cmbant cmbant commented Jul 3, 2024

At least for the moment, the weights are mostly zero since the bandpowers are localized, so most elements of the matrix multiplication are not needed. This refactor speeds it up by a factor of about 8 on my laptop, hopefully enough to make dragging sampling useful. It could of course be further improved by putting the whole operation into numba with OpenMP.
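The idea above can be sketched roughly like this (hypothetical function and variable names, not the actual mflike code): since each bandpower window is nonzero only over a narrow ℓ range, the product can be sliced down to that support instead of multiplying by the full, mostly zero matrix.

```python
import numpy as np

def bandpower_slow(windows, cl):
    """Naive version: full matrix-vector product, windows (n_bands, n_ell)."""
    return windows @ cl

def bandpower_fast(windows, cl):
    """Sketch of the optimization: restrict each band's dot product to the
    contiguous nonzero support of its window (in practice the support
    ranges would be precomputed once, not found on every call)."""
    out = np.empty(windows.shape[0])
    for i, w in enumerate(windows):
        nz = np.flatnonzero(w)        # indices where this window is nonzero
        lo, hi = nz[0], nz[-1] + 1    # contiguous nonzero range
        out[i] = w[lo:hi] @ cl[lo:hi]
    return out
```

Both versions return the same bandpowers; the fast one touches only the localized window support, which is where the speed-up comes from.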

Now the remaining computing time is split roughly equally between _get_foreground_model and the rest. To see where time was spent I did

from io import StringIO
import pstats
from cProfile import Profile
prof = Profile()
prof.enable()
nuis_params['a_tSZ']*=1.00001 # make sure no cache on fast params
loglikes, derived = model.loglikes(nuis_params)
prof.disable()
s = StringIO()
ps = pstats.Stats(prof, stream=s)
ps.strip_dirs()
ps.sort_stats('time')
ps.print_stats(30)
ps.sort_stats('cumtime')
ps.print_stats(30)
print(s.getvalue())

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (master@660e0fa). Learn more about missing BASE report.


@@            Coverage Diff            @@
##             master      #80   +/-   ##
=========================================
  Coverage          ?   87.59%           
=========================================
  Files             ?        3           
  Lines             ?      395           
  Branches          ?        0           
=========================================
  Hits              ?      346           
  Misses            ?       49           
  Partials          ?        0           
Files Coverage Δ
mflike/mflike.py 90.40% <100.00%> (ø)

@xgarrido
Collaborator

xgarrido commented Jul 5, 2024

Really looks good to me. I did the time benchmark as suggested and I'm getting the same speed-up factors.

One remark, a bit uncorrelated to this PR: I had a look into the _fast_chi_square function of cobaya which, most of the time, falls back to camb.mathutils.chi_squared (in case camb is installed), but when camb is not installed the default way to compute $\chi^2$ is https://github.com/CobayaSampler/cobaya/blob/master/cobaya/likelihoods/base_classes/DataSetLikelihood.py#L27. Regarding the time benchmark, we noticed some time ago that the dot product was slightly slower than the @ product introduced in Python 3.5. So we might save a few microseconds in case someone uses mflike (or any likelihood relying on _fast_chi_square) without camb installed.

@cmbant
Collaborator Author

cmbant commented Jul 5, 2024

Interesting, I had assumed they mapped to identical code. If I do this in Jupyter (for n = 9000), if anything the .dot form seems to be faster for me:

%timeit v @ cov @ v
%timeit cov.dot(v).dot(v)
%timeit v @ cov @ v
%timeit cov.dot(v).dot(v)

11.4 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.4 ms ± 63.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.2 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

CAMB's version gets the extra factor of 2 from symmetry.
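If I understand correctly, the symmetry saving comes from a symmetric matrix-vector product that only reads one triangle of the matrix. A minimal sketch of the same idea in pure Python, using the BLAS dsymv routine exposed by scipy (chi_squared_sym is a hypothetical helper name, not the CAMB implementation):

```python
import numpy as np
from scipy.linalg.blas import dsymv

def chi_squared_sym(invcov, v):
    """Compute v^T invcov v for a symmetric invcov.

    dsymv reads only one triangle of the matrix, roughly halving the
    memory traffic of a general matrix-vector product.
    """
    return v @ dsymv(1.0, invcov, v)
```

For a dense symmetric covariance this gives the same result as v @ invcov @ v while doing about half the work.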

@xgarrido
Collaborator

xgarrido commented Jul 5, 2024

I also thought they mapped to the same piece of code. In your test, the cov matrix is very sparse (hence the purpose of this PR). If you do the test on the same 9000x9000 matrix with random numbers, then @ is much faster than dot. I'm not sure I understand the difference between dot and matmul, which corresponds to the @ implementation: https://numpy.org/doc/stable/reference/generated/numpy.matmul.html#numpy.matmul
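For what it's worth, for 2-D arrays dot and matmul return identical results; they only diverge for stacked (more than 2-D) inputs, where matmul broadcasts over the leading dimensions while dot combines the last axes of every pair:

```python
import numpy as np

a = np.zeros((2, 3, 4))
b = np.zeros((2, 4, 5))

# matmul treats the inputs as stacks of matrices and broadcasts:
np.matmul(a, b).shape  # (2, 3, 5): batched matrix product

# dot sums over the last axis of a and the second-to-last of b,
# pairing every matrix in a with every matrix in b:
np.dot(a, b).shape     # (2, 3, 2, 5)
```

So for the 2-D covariance times 1-D vector case in the benchmark above, any timing difference would come from the implementation path, not from a difference in semantics.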

@cmbant
Collaborator Author

cmbant commented Jul 5, 2024

Odd, I see the same with random numbers and matmul or @ (maybe it changed in numpy 2, or it depends on threading?).

@cmbant cmbant mentioned this pull request Jul 26, 2024
@cmbant cmbant merged commit fd46540 into master Sep 4, 2024
11 checks passed
@cmbant cmbant deleted the optimize branch September 4, 2024 07:56